ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment
The objective of stylized speech-driven facial animation is to create
animations that encapsulate specific emotional expressions. Existing methods
often depend on pre-established emotional labels or facial expression
templates, which may limit the necessary flexibility for accurately conveying
user intent. In this research, we introduce a technique that enables the
control of arbitrary styles by leveraging natural language as emotion prompts.
This technique presents benefits in terms of both flexibility and
user-friendliness. To realize this objective, we initially construct a
Text-Expression Alignment Dataset (TEAD), wherein each facial expression is
paired with several prompt-like descriptions. We propose an innovative automatic
annotation method, supported by Large Language Models (LLMs), to expedite the
dataset construction, thereby eliminating the substantial expense of manual
annotation. Following this, we utilize TEAD to train a CLIP-based model, termed
ExpCLIP, which encodes text and facial expressions into semantically aligned
style embeddings. The embeddings are subsequently integrated into the facial
animation generator to yield expressive and controllable facial animations.
Given the limited diversity of facial emotions in existing speech-driven facial
animation training data, we further introduce an effective Expression Prompt
Augmentation (EPA) mechanism to enable the animation generator to support
unprecedented richness in style control. Comprehensive experiments demonstrate
that our method generates expressive facial animations and offers enhanced
flexibility in conveying the desired style.
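The text-expression alignment at the heart of ExpCLIP can be illustrated with a CLIP-style symmetric contrastive objective: matched text/expression pairs sit on the diagonal of a similarity matrix and are pulled together, while all other pairings act as negatives. The sketch below is a minimal NumPy illustration with made-up embeddings, not the authors' implementation:

```python
import numpy as np

def symmetric_clip_loss(text_emb, expr_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE loss: the i-th text description and the
    i-th facial-expression code form a positive pair; all other pairings in
    the batch serve as negatives."""
    # L2-normalize both sets of embeddings.
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    e = expr_emb / np.linalg.norm(expr_emb, axis=1, keepdims=True)
    # Pairwise cosine similarities, scaled by temperature.
    logits = t @ e.T / temperature
    n = logits.shape[0]

    def xent(l):
        # Cross-entropy with targets on the diagonal (numerically stable).
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the text-to-expression and expression-to-text directions.
    return 0.5 * (xent(logits) + xent(logits.T))

text = np.eye(4, 8)                       # 4 toy orthonormal "text" embeddings
loss_aligned = symmetric_clip_loss(text, text)           # perfect pairing
loss_shuffled = symmetric_clip_loss(text, np.roll(text, 1, axis=0))
```

As expected for a contrastive objective, the loss is near zero when each text embedding matches its own expression embedding and large when the pairing is shuffled.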
ResLT: Residual Learning for Long-tailed Recognition
Deep learning algorithms face great challenges with long-tailed data
distribution which, however, is quite a common case in real-world scenarios.
Previous methods tackle the problem from either the aspect of input space
(re-sampling classes with different frequencies) or loss space (re-weighting
classes with different weights), suffering from heavy over-fitting to tail
classes or hard optimization during training. To alleviate these issues, we
propose a more fundamental perspective for long-tailed recognition, i.e., from
the aspect of parameter space, and aim to preserve specific capacity for
classes with low frequencies. From this perspective, the trivial solution that
uses separate branches for the head, medium, and tail classes and then sums
their outputs as the final result is not feasible. Instead, we
design an effective residual fusion mechanism: with one main branch optimized
to recognize images from all classes, two residual branches are gradually fused
and optimized to enhance images from the medium+tail classes and the tail
classes, respectively. The branches are then aggregated into the final results
by additive shortcuts. We test our method on several benchmarks, i.e., the
long-tailed versions of CIFAR-10, CIFAR-100, Places, ImageNet, and iNaturalist
2018. Experimental results show that our method achieves a new state of the art
for long-tailed recognition. Code will be available at
\url{https://github.com/FPNAS/ResLT}.
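The branch layout described above can be sketched in a few lines. The snippet below is an illustrative toy setup (6 classes with an assumed head/medium/tail split), not the authors' implementation: residual branches are fused into the final logits by additive shortcuts, and training-time masks restrict which samples optimize each residual branch.

```python
import numpy as np

# Assumed toy split: 6 classes with head = {0, 1}, medium = {2, 3}, tail = {4, 5}.
MEDIUM_TAIL = {2, 3, 4, 5}
TAIL = {4, 5}

def fuse(main_logits, res_mt_logits, res_t_logits):
    """Aggregate the main branch and the two residual branches into final
    logits via additive shortcuts."""
    return main_logits + res_mt_logits + res_t_logits

def branch_sample_masks(labels):
    """Training-time masks: the first residual branch is optimized only on
    medium+tail images, the second only on tail images, while the main branch
    sees samples from all classes."""
    mt_mask = np.array([y in MEDIUM_TAIL for y in labels])
    t_mask = np.array([y in TAIL for y in labels])
    return mt_mask, t_mask

labels = [0, 2, 4, 5]                      # one head, one medium, two tail samples
mt_mask, t_mask = branch_sample_masks(labels)
fused = fuse(np.zeros((4, 6)), np.ones((4, 6)), np.ones((4, 6)))
```

The additive shortcut means the residual branches only need to learn corrections on top of the main branch's predictions for the under-represented classes.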
Generalized Parametric Contrastive Learning
In this paper, we propose the Generalized Parametric Contrastive Learning
(GPaCo/PaCo) which works well on both imbalanced and balanced data. Based on
theoretical analysis, we observe that supervised contrastive loss tends to bias
high-frequency classes and thus increases the difficulty of imbalanced
learning. We introduce a set of parametric class-wise learnable centers to
rebalance from an optimization perspective. Further, we analyze our GPaCo/PaCo
loss under a balanced setting. Our analysis demonstrates that GPaCo/PaCo can
adaptively enhance the intensity of pushing samples of the same class close as
more samples are pulled together with their corresponding centers and benefit
hard example learning. Experiments on long-tailed benchmarks demonstrate a new
state of the art for long-tailed recognition. On full ImageNet, models from
CNNs to vision transformers trained with GPaCo loss show better generalization
performance and stronger robustness compared with MAE models. Moreover, GPaCo
can be applied to the semantic segmentation task, and clear improvements are
observed on the four most popular benchmarks. Our code is available at
https://github.com/dvlab-research/Parametric-Contrastive-Learning.
Comment: TPAMI 2023.
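The key idea of the parametric class-wise centers can be illustrated with a per-anchor contrastive loss in which the contrast set contains the other batch features plus one learnable center per class. The NumPy sketch below uses illustrative names and shapes and is not the authors' code:

```python
import numpy as np

def paco_loss(anchor, anchor_label, features, labels, centers, temperature=0.1):
    """PaCo-style contrastive loss for one anchor: the contrast set is the
    other batch features plus one learnable center per class; positives are
    the same-class features and the anchor's own class center."""
    keys = np.concatenate([features, centers], axis=0)
    logits = keys @ anchor / temperature
    # Stable log-softmax over the whole contrast set.
    m = logits.max()
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))
    pos = np.concatenate([
        np.asarray(labels) == anchor_label,         # same-class batch features
        np.arange(len(centers)) == anchor_label,    # the anchor's own center
    ])
    return -log_probs[pos].mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))             # toy batch of 4 feature vectors
labels = [0, 1, 0, 2]
centers = rng.normal(size=(3, 8))           # one learnable center per class
loss = paco_loss(feats[0], labels[0], feats[1:], labels[1:], centers)
```

Because every sample is always pulled toward its class center regardless of class frequency, the centers act as a rebalancing term in the optimization, which is the intuition behind the claimed benefit on imbalanced data.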
Pressure-induced spin reorientation transition in layered ferromagnetic insulator Cr2Ge2Te6
Anisotropic magnetoresistance (AMR) of Cr2Ge2Te6 (CGT), a layered
ferromagnetic insulator, is investigated under an applied hydrostatic pressure
up to 2 GPa. The easy axis direction of the magnetization is inferred from the
AMR saturation feature in the presence and absence of the applied pressure. At
zero applied pressure, the easy axis is along the c-direction or perpendicular
to the layer. Upon application of a hydrostatic pressure > 1 GPa, the uniaxial
anisotropy switches to easy-plane anisotropy, which drives the equilibrium
magnetization from the c-axis to the ab-plane at zero magnetic field and
amounts to a giant magnetic anisotropy energy change (>100%). As the
temperature is increased across the Curie temperature, the characteristic AMR
effect gradually decreases and disappears. Our first-principles calculations
confirm the giant magnetic anisotropy energy change with moderate pressure and
assign its origin to the increased off-site spin-orbit interaction of Te atoms
due to a shorter Cr-Te distance. Such a pressure-induced spin reorientation
transition is very rare in three-dimensional ferromagnets, but it may be common
to other layered ferromagnets with similar crystal structures to CGT, and
therefore offers a unique way to control magnetic anisotropy.
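The easy-axis to easy-plane switch described above can be summarized with the textbook uniaxial anisotropy energy (a standard expression, not taken from this paper), where the sign of the anisotropy constant selects the equilibrium direction:

```latex
E(\theta) = K_u \sin^2\theta
```

Here $\theta$ is the angle between the magnetization and the c-axis. For $K_u > 0$ the energy minimum is at $\theta = 0$ (easy axis along c), while for $K_u < 0$ it is at $\theta = \pi/2$ (easy plane); the reported transition corresponds to pressure driving $K_u$ through zero near 1 GPa.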